Abstract


During L/U decomposition of a sparse matrix, it is possible to perform computation on many diagonal elements simultaneously. Pivots that can be processed in parallel are related by a compatibility relation and are grouped in a compatible set. The collection of all maximal compatibles yields different maximum-sized sets of pivots that can be processed in parallel. Generation of the maximal compatibles is based on the information obtained from an incompatible table. This table provides information about pairs of incompatible pivots. In this paper, generation of the maximal compatibles of pivot elements for a class of small sparse matrices is studied first. The algorithm involves a binary tree search and has a complexity exponential in the order of the matrix. Different strategies for selection of a set of compatible pivots based on the Markowitz criterion are investigated. The competing issues of parallelism and fill-in generation are studied and results are provided. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. This technique generates a set of compatible pivots with the property of generating few fills. A new heuristic algorithm is then proposed that combines the idea of an ordered compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. Finally, an elimination set to reduce the matrix is selected. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices are presented and analyzed.


*Research was supported in part by NASA Contract No. NAS1-17070 and by the Air Force Office of Scientific Research under Grant No. AFOSR while the authors were in residence at ICASE, NASA
Introduction

Solution of a linear system of equations is required in many application programs. One such area is VLSI circuit simulation. Every computer-aided circuit analysis program includes a routine that solves a system of sparse linear equations. If implicit integration is used, at every time step one must solve a system of nonlinear equations (usually by Newton iteration), and at every iteration a system of linear equations must be solved. Depending on the integration method, the number of times that a sparse system of linear equations needs to be solved may be large. If the solution time for the sparse system can be reduced, the total circuit analysis time would be significantly reduced. One method for solving such a system is the factorization of the matrix into lower and upper triangular matrices followed by forward and back substitutions.

One promising area for advances in solution technique is the use of parallel computers and parallel algorithms. Our previous work on parallelizing the MA28 [1] sparse matrix package for the HEP [2] multiprocessor suggests that sufficient parallelism is not obtainable in sparse L/U decomposition without processing multiple pivots in parallel [3]. Parallel pivoting strategies have been investigated by Calahan [4] and more recently by Wing and Huang [5], [6], Jess and Kees [7] and Peters [8]. Although the number of operations possible in parallel may be large in a very sparse system, exploitation of all the available parallelism may significantly increase the generation of fill-ins (zero elements of the matrix becoming nonzero as a result of elimination). Since fill-in increases the total computational work, it is important to keep the number generated under control. The purpose of this work is to study sparse L/U decomposition on a multiprocessor by means of an algorithm which exploits parallel pivots and keeps fill-in low. The class of sparse systems guiding the study will be those arising from the simulation of VLSI circuits using a program such as SPICE [9].


Wing and Huang [5] represent the triangulation process by a directed graph in which the vertices represent a divide or update operation (the operations required to perform the triangulation) and the edges determine the precedence relation of the operations to be executed. By assigning level numbers to the directed graph, they identify all operations on the same level as operations that can be done in parallel. They use a weighted combination of fill-in cost and depth of computation in a heuristic to determine a nearly optimal pivot sequence. While Wing and Huang identify all the operations that can be done in parallel, we will identify all pivots that can be processed in parallel at each step. An issue that has not been discussed in the literature is that a sparse matrix usually offers several different sets of possible pivot candidates at each step, and the sizes of these sets may vary considerably. It therefore seems important to study these possibilities and the effect of parallel pivoting on application matrices. Algorithms identifying parallel pivot candidates are complex, so developing them is worthwhile only if the amount of parallelism in circuit-domain matrices is large enough to justify the computation required to identify it.




In this paper, we assume a shared-memory MIMD model for our parallel computation, in which the total memory address space is accessible uniformly to all parallel units (processes or individual processors). This computational model should provide synchronization mechanisms to allow multiple memory updates. If multiple updates are aimed at the same memory cell, the penalty paid is a short delay in access time. Based on this computational model, the first half of this paper is devoted to studying the amount of parallelism that exists in application matrices. This is carried out by producing, for a number of small matrices, all possible sets of pivot candidates which can be processed in parallel at each step. Observations are then made on different strategies for choosing one of the sets produced at each step, and hence on the generation of fill-ins and the number of possible parallel pivoting steps. The complete and detailed analysis of this study leads us into the second half of the paper, where we describe a fast heuristic algorithm to produce a set of acceptable parallel pivot candidates for reducing the matrix at each step. Issues involved in balancing parallel work and fill-in generation are discussed and verified through simulated results.

Parallel  Pivot  Candidates 

The triangulation method used here, as mentioned above, will be sparse L/U decomposition. For simplicity, we only consider the diagonal elements of the matrix as pivot candidates. Note that pivoting usually refers to unsymmetric permutations of the matrix for swapping an off-diagonal matrix element with a diagonal element. In this paper, we only consider symmetric permutations of the matrix. Even though we are not pivoting in the above sense, the terms pivot and pivoting are used throughout the paper to refer, respectively, to the diagonal element used to reduce the matrix at a given step and to a symmetric permutation.

In a sparse matrix, two pivots $a_{ii}$ and $a_{jj}$ can be processed in parallel if $a_{ij}$ and $a_{ji}$ are both zero. In other words, during elimination, row j is not involved in the elimination process taking place for pivot $a_{ii}$, and row i is not involved in the process for $a_{jj}$. This holds only if we provide correct synchronization for simultaneous updates during the elimination with parallel pivot candidates:

1. During elimination, when processing pivots in parallel, an element of a nonpivot row may need to be updated by all or some of the parallel processes handling pivots i, j, ... for the current step. In order for each process to obtain a completely updated value, reflecting every previous update, each update must be performed as an indivisible, synchronized read-modify-write, although the processes may reach the element asynchronously. The order in which parallel processes update an element is of no importance (except for roundoff errors).

2. During elimination, when processing pivots in parallel, a fill-in may be generated in position (m,n), and more than one process may try to generate a fill-in in the same position (m,n). The position (m,n) for the fill-in must be created once, by one process only, and the other processes then update its value as in 1.
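As an illustration only, the compatibility test and the two synchronization rules can be sketched in Python. Here `B` is the nonzero pattern of A (1 where an element is nonzero, 0 otherwise) stored as a list of 0/1 rows with 0-based indices, and `SharedRow` is a hypothetical row store whose lock stands in for whatever synchronization mechanism the target multiprocessor provides; neither is taken from the program used in this study.

```python
import threading


def compatible(B, i, j):
    """Pivots a_ii and a_jj may be processed in parallel when a_ij and a_ji
    are both zero; a pivot is trivially compatible with itself."""
    return i == j or (not B[i][j] and not B[j][i])


class SharedRow:
    """Hypothetical nonpivot row shared by several pivot processes.

    Entries are kept in a dict (column -> value). A single lock makes every
    read-modify-write indivisible, so no update is lost (rule 1)."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})
        self._lock = threading.Lock()

    def update(self, col, delta):
        with self._lock:
            # Rule 2: if the position is still structurally zero, the first
            # process to arrive creates the fill-in; later ones only accumulate.
            self.entries[col] = self.entries.get(col, 0.0) + delta
```

For example, several threads calling `row.update(2, -1.0)` on the same `SharedRow` leave the sum of all their contributions in column 2, in whatever order they arrive. One lock per row is the simplest choice; a lock per entry would shorten the access delays mentioned earlier without changing the logic.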
If two pivots $a_{ii}$ and $a_{jj}$ can be processed in parallel, and if $a_{jj}$ and $a_{kk}$ can also be processed in parallel, it does not follow that $a_{ii}$, $a_{jj}$, and $a_{kk}$ can all be processed in parallel. The relation between parallel pivot candidates is reflexive and symmetric, but not transitive, and is thus a compatibility relation. Two pivots related in this way will simply be said to be compatible in what follows. A consequence of the nontransitivity of the compatibility relation is that it classifies the elements of a set into nondisjoint subsets, such that all members of a subset are mutually compatible. These subsets are called compatibility classes. Thus, in order to obtain all possible sets of pivots that can be processed in parallel and are of maximum size, we need to find all maximal compatibles. A maximal compatible is a compatible that is not included in any larger compatible.
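A short sketch makes the non-transitivity concrete and shows how a candidate set can be checked for being a compatible, and a maximal compatible. The helper names are our own; `B` is again a 0/1 nonzero pattern with 0-based pivot indices.

```python
from itertools import combinations


def compatible(B, i, j):
    return i == j or (not B[i][j] and not B[j][i])


def is_compatible_set(B, pivots):
    """Every pair of pivots in the set must be compatible."""
    return all(compatible(B, i, j) for i, j in combinations(sorted(pivots), 2))


def is_maximal_compatible(B, pivots):
    """Compatible, and no further pivot can be added without a conflict."""
    pivots = set(pivots)
    if not is_compatible_set(B, pivots):
        return False
    return all(not is_compatible_set(B, pivots | {k})
               for k in range(len(B)) if k not in pivots)


# Pivots 0 and 1 are compatible, and so are 1 and 2, but a_02 != 0 makes
# 0 and 2 incompatible: the relation is not transitive.
B = [[1, 0, 1],
     [0, 1, 0],
     [0, 0, 1]]
```

With this `B`, both {0, 1} and {1, 2} are maximal compatibles, they overlap in pivot 1, and {0, 1, 2} is not compatible, which is exactly the nondisjoint classification described above.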

To clarify the discussion, we define a boolean matrix B for each sparse matrix A, such that

$$ b_{ij} = \begin{cases} 1 & \text{iff } a_{ij} \neq 0 \\ 0 & \text{otherwise} \end{cases} $$

where $b_{ij}$ and $a_{ij}$ denote elements of B and A respectively.
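Since symbolic factorization depends only on where the nonzeros are (ignoring accidental numerical cancellation), every later step can work on B alone. A one-line sketch, assuming A is given as a dense list of rows (our own convention for illustration):

```python
def boolean_pattern(A):
    """Return B with b_ij = 1 iff a_ij != 0, and b_ij = 0 otherwise."""
    return [[1 if a != 0 else 0 for a in row] for row in A]
```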

Several approaches for constructing the set of maximal compatibles exist, and they are all based on construction of an incompatible table [10]. The incompatible table specifies pairs of incompatible elements. Assume pivots are taken from the diagonal elements of the sparse matrix and are numbered 1 through n, corresponding to the diagonal elements of rows 1 through n. We can then represent the incompatible table as a table consisting of (n-1) columns, where column i has (n-i) elements. Columns of the table correspond to pivot elements of the matrix. Column one of the table, corresponding to pivot number one, is set to the bit vector resulting from ORing row one and column one of the matrix B and keeping the last (n-1) elements. The same process is repeated for pivot 2 (column 2 of the table), using the submatrix obtained from the original matrix by removing row and column one. Each time a column of the table is completely constructed, the corresponding row and column of the matrix are removed. The process is repeated for all pivots in order. It is important to note that the incompatible table is constructed for a given ordering of the sparse matrix. Thus, there are n! different incompatible tables for the n! possible diagonal orderings of an n by n sparse matrix. In what follows, we represent the incompatible table as an array of dimension n, say imptbl(n), with elements of the array being sets of at most n elements each. Each set corresponds to a column of the table. As an illustrative example, the incompatible table for the matrix A1 of Fig. 1.1.a is given in Fig. 1.1.b.
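The construction of imptbl can be sketched in a few lines. This is our own illustration, not the original program: pivots are numbered 1 to n as in the text, `B` is a 0/1 list of rows, and column i of the table is returned as the set of later pivots j > i with $b_{ij} = 1$ or $b_{ji} = 1$, which is what ORing row i and column i of the remaining submatrix and keeping its last n-i entries produces.

```python
def incompatible_table(B):
    """Incompatible table for the current ordering of B (pivots numbered 1..n).

    table[i] is the set of pivots j > i that conflict with pivot i. Because
    the relation is symmetric, each incompatible pair appears only once,
    in the column of its lower-numbered pivot.
    """
    n = len(B)
    table = {}
    for i in range(1, n):                    # table columns 1 .. n-1
        table[i] = {j for j in range(i + 1, n + 1)
                    if B[i - 1][j - 1] or B[j - 1][i - 1]}
    return table
```

Reordering the rows and columns of B changes which column records each pair, which is why the table is tied to one particular ordering.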

The maximal compatibles are found by combining the pivot pairs from the incompatible table into larger groups of mutually compatible elements. Several systematic approaches for extracting the maximal compatibles have been suggested, and they all use an exhaustive search routine. The approach that seems most suitable for programming on a digital computer initially assumes that all pivot candidates can be grouped into one set. The information from the incompatible table is then used to detect contradictions and to split the groups where necessary. This procedure involves searching a binary tree. Initially, it is assumed that all pivots are compatible, so they are grouped in one set consisting of all pivot elements. This set is placed at the root of a binary tree, level zero. Next, the set of pivots incompatible with pivot number one, obtained from the incompatible table, is used to split the set at the root into a left and a right set, constituting level one. The left set consists of all elements of its parent set at level zero, except those incompatible with pivot one. The right set consists of the same elements as the parent set, except pivot one itself. At the next step, the incompatibility information for pivot number two is used to break each set at level one into a left and a right set for level 2. Since the matrix is sparse, some of the sets at a given level will not split into smaller sets for some pivoting elements, but they may still contain incompatible elements and will split for some later pivots. Consequently, the binary tree corresponding to this search will not always be a dense tree. This process is repeated until no more splitting of the sets is possible. The leaf sets are then checked, and every set included in a larger leaf set is eliminated. The remaining sets constitute all possible maximal compatibles. Note that the length of a path from the root to a leaf can be at most n.

Fig. 1.1.a  Matrix A1
Fig. 1.1.b  Incompatible Table

The above process is shown for the example matrix of Fig. 1.1 in Fig. 1.2. Initially, pivots number 1 through 7 are grouped together as the starting set. Column one of the incompatible table indicates that pivot 5 is incompatible with pivot one. Thus the starting set is split into two sets, (1,2,3,4,6,7) and (2,3,4,5,6,7). At the next level, each of these two sets is split using the incompatibility information for pivot number two from the table, giving four sets. This process is continued until no more splits are possible. At the end, the extra sets (3,4,7) and (4,6,7), which are included in the maximal sets (b) and (c) respectively, are eliminated. The remaining five sets are the maximal compatibles.

Fig. 1.2  Binary Tree Search to Obtain the Set of Maximal Compatibles

A high-level description of the above procedure is given below:

procedure MAXCOMP(sset, i)

Assumptions:

- pivot candidates are numbered from 1 to n;

- initially sset consists of all pivots in the matrix and
  i is the first pivot.

begin
  if (sset is a compatible set) or (i = n) then
    record sset as a leaf set
  else if not (i in sset) or (sset * imptbl[i] = []) then
    MAXCOMP(sset, i+1)          (* sset does not split at this level *)
  else
  begin
    (* split sset into left and right sets *)
    lset := sset - imptbl[i];
    rset := sset - [i];
    MAXCOMP(lset, i+1);
    MAXCOMP(rset, i+1)
  end
end
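The procedure translates directly into Python. The sketch below is our own: it takes the incompatible table as a dict from pivot number to the set of later pivots it conflicts with, performs the binary tree search, and finally drops every leaf contained in a larger leaf. It makes no attempt at the branch trimming discussed next, so its running time remains exponential in n.

```python
def maximal_compatibles(table, n):
    """All maximal compatibles of pivots 1..n, given the incompatible table."""
    leaves = set()

    def is_compatible(s):
        # table[i] lists only later pivots, so each pair is checked once.
        return all(not (table.get(i, set()) & s) for i in s)

    def maxcomp(sset, i):
        if i >= n or is_compatible(sset):
            leaves.add(sset)                  # leaf of the search tree
            return
        conflicts = table.get(i, set()) & sset
        if i not in sset or not conflicts:
            maxcomp(sset, i + 1)              # this set does not split here
            return
        maxcomp(sset - conflicts, i + 1)      # left set: keep pivot i
        maxcomp(sset - {i}, i + 1)            # right set: drop pivot i

    maxcomp(frozenset(range(1, n + 1)), 1)
    # Eliminate every leaf set that is included in a larger leaf set.
    maximal = [s for s in leaves if not any(s < t for t in leaves)]
    return sorted(maximal, key=sorted)
```

For a chain of conflicts 1-2, 2-3, 3-4, `maximal_compatibles({1: {2}, 2: {3}, 3: {4}}, 4)` returns the three sets {1,3}, {1,4} and {2,4}; the leaves {3} and {4} produced along the way are discarded as subsets. Keeping only the sets of largest size is then a simple filter on the result.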

In the above procedure, many branches do not need to be continued to the completion of the search, since they are included in other subtrees. Moreover, as will be described later, we only need to produce compatible sets of maximum size. Thus, many branches in this tree could be trimmed to limit the amount of search. Even with these features, the algorithm has exponential complexity and serves only to obtain information about sparse matrices.

To study the issues discussed earlier, a PASCAL program was written to perform symbolic L/U decomposition on a sparse matrix. Our objective was to study the effects of parallel pivoting, so the program performs the decomposition up to the last parallel step and does not continue once parallel pivot candidates are no longer available. The structure of the program is outlined below:

program PIVOTSET

- Read in the input matrix and construct the matrix structure.
- Construct all maximal compatibles.
- If parallel pivoting is not possible, go to Stop.
- Pick a set of compatible pivots to be processed in parallel.
- Permute the matrix according to the parallel pivots for this step.
- Reduce the matrix and insert the resultant fill-ins.
- Repeat.
- Stop.
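The permute-and-reduce step of this outline works only on the nonzero pattern. A minimal sketch, assuming a 0/1 list-of-lists pattern and a set of mutually compatible pivots given as 0-based indices (both our own conventions, not the original program): for each pivot p, every pair (r, c) of remaining rows and columns with $b_{rp} = b_{pc} = 1$ becomes nonzero, and a position reached by several pivots is created only once (rule 2 above). The remaining rows and columns, kept in their original relative order, form the reduced pattern for the next step.

```python
def reduce_step(B, pivots):
    """Symbolically eliminate a set of compatible diagonal pivots at once.

    Returns (reduced pattern, remaining original indices, fill positions).
    """
    n = len(B)
    pivots = set(pivots)
    rest = [k for k in range(n) if k not in pivots]
    fills = set()
    for p in pivots:
        rows = [r for r in rest if B[r][p]]   # rows updated by pivot p
        cols = [c for c in rest if B[p][c]]   # columns touched by pivot p
        for r in rows:
            for c in cols:
                if not B[r][c]:
                    fills.add((r, c))         # a set: each fill created once
    reduced = [[1 if B[r][c] or (r, c) in fills else 0 for c in rest]
               for r in rest]
    return reduced, rest, fills
```

Because the pivots are compatible, no pivot row or column appears in another pivot's update, so restricting the loops to the remaining indices loses nothing. Repeating this step on `reduced` with a new compatible set reproduces the loop of the outline.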
Analysis Performed

In general, matrices arising from circuits have many different sets of compatible pivots of equal maximum size. Depending on how a set is chosen to reduce the matrix at each step, we obtain a different behavior in the generation of fill-in elements and, as a result, different possibilities for continuing parallel pivoting in the following steps. We have studied the issues of fill-in generation and parallelism in pivoting by using different strategies to select a set of compatible pivots and then collecting statistics on several circuit matrices generated by the SPICE circuit simulation program.

The Markowitz criterion [11] is well known for minimizing the generation of fill-ins in sparse matrices in sequential programming. It is based on the fact that at step k, the maximum number of fill-ins generated by choosing $a_{ij}$ as pivot is $(r_i - 1)(c_j - 1)$. Here $r_i - 1$ is the number of nonzero elements other than $a_{ij}$ in the i-th row of the reduced matrix, and $c_j - 1$ is the number of nonzero elements other than $a_{ij}$ in column j of the reduced matrix. Markowitz selects as pivot element at step k the element which minimizes $(r_i - 1)(c_j - 1)$. The product $(r_i - 1)(c_j - 1)$ is the Markowitz number of element $a_{ij}$. In what follows, we use the Markowitz idea as a basis for the selection of a compatible pivot set.
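Because only diagonal pivots are considered here, pivot i has Markowitz number $(r_i - 1)(c_i - 1)$, with the counts taken over the current reduced matrix. A small sketch (our own helper names), counting off-diagonal nonzeros directly so that the result equals $(r_i - 1)(c_i - 1)$ whenever $a_{ii} \neq 0$:

```python
def markowitz(B, i):
    """Markowitz number of diagonal pivot i in the 0/1 pattern B."""
    n = len(B)
    r_off = sum(1 for j in range(n) if j != i and B[i][j])  # r_i - 1
    c_off = sum(1 for k in range(n) if k != i and B[k][i])  # c_i - 1
    return r_off * c_off


def markowitz_pivot(B):
    """Sequential Markowitz choice: the diagonal pivot with the smallest number."""
    return min(range(len(B)), key=lambda i: markowitz(B, i))
```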

In our first analysis we compare two different strategies for choosing a set of compatible pivots among all maximal compatibles. In both cases we consider only the sets of maximum size. The first strategy (called Markowitz sum) chooses, among all sets of maximum size, the set in which the sum of the Markowitz numbers of all its elements is minimum. The problem here is that some of the pivots in the set chosen for reducing the matrix may generate fill-ins in the same positions, so the sum, which treats each pivot as if it were processed sequentially, overestimates the Markowitz count. As an alternative, a second strategy is employed (called Ored Markowitz). Here, using the boolean matrix B corresponding to the sparse matrix under consideration, we count the number of nonzeros in the vector that results from ORing the rows of the pivot candidates in the set, and multiply this number by the number of nonzeros in the vector that results from ORing the columns of the potential pivots.
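Both selection rules can be sketched on the 0/1 pattern. Our reading of Ored Markowitz is literal: it ORs the pattern rows (and columns) of all pivots in the set, including the pivot positions themselves, and multiplies the two nonzero counts; the text does not say whether the pivots' own entries are excluded. `candidate_sets` would be the maximal compatibles of the current step, and all names are our own.

```python
def markowitz(B, i):
    n = len(B)
    r_off = sum(1 for j in range(n) if j != i and B[i][j])
    c_off = sum(1 for k in range(n) if k != i and B[k][i])
    return r_off * c_off


def markowitz_sum(B, pivots):
    """First strategy: sum of the individual Markowitz numbers."""
    return sum(markowitz(B, p) for p in pivots)


def ored_markowitz(B, pivots):
    """Second strategy: nonzeros of the ORed rows times nonzeros of the ORed columns."""
    n = len(B)
    row_or = [any(B[p][j] for p in pivots) for j in range(n)]
    col_or = [any(B[k][p] for p in pivots) for k in range(n)]
    return sum(row_or) * sum(col_or)


def choose_set(B, candidate_sets, score):
    """Among the sets of maximum size, pick one with the smallest score."""
    largest = max(len(s) for s in candidate_sets)
    return min((s for s in candidate_sets if len(s) == largest),
               key=lambda s: score(B, s))
```

Passing `markowitz_sum` or `ored_markowitz` as `score` switches between the two strategies without changing the selection logic.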

Comparison of the above strategies on our test cases shows that the first method is almost always superior. Our results show that, by minimizing the Markowitz sum, we never generate more fill-ins, and often more rows are reduced in parallel steps. This study has also shown that the amount of parallelism in circuit matrices is quite high, but that the generation of fill-in terms is also quite high in most cases when compared to the sequential runs on the same matrices. The number of potential pivots that can be processed in parallel at each step is so high that we could process fewer pivots in parallel in a step without considerably limiting the parallel work. An experiment to study this possibility was performed by picking the maximum-sized set with minimum Markowitz sum, as explained above. This set is then used to reduce the matrix, and the following analysis is performed on the set of compatible pivots:
- Discard the pivot with maximum Markowitz count and determine the number of fill-ins that would be generated as a result.

- Repeat the above procedure until no more pivots can be discarded from the set, either because the set size is too small or because the Markowitz sum of the set is zero.
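The discard analysis above can be sketched as follows. This is our own illustration: `fill_count` counts distinct fill positions for a compatible set, a discarded pivot simply stays in the matrix as an ordinary row and column, ties in the Markowitz count are broken by the lowest index, and the hypothetical `min_size` parameter stands in for "the set size is too small", whose value the text does not give.

```python
def markowitz(B, i):
    n = len(B)
    r_off = sum(1 for j in range(n) if j != i and B[i][j])
    c_off = sum(1 for k in range(n) if k != i and B[k][i])
    return r_off * c_off


def fill_count(B, pivots):
    """Distinct fill-in positions created by eliminating all pivots in parallel."""
    rest = [k for k in range(len(B)) if k not in pivots]
    fills = set()
    for p in pivots:
        for r in rest:
            if B[r][p]:
                for c in rest:
                    if B[p][c] and not B[r][c]:
                        fills.add((r, c))
    return len(fills)


def discard_profile(B, pivots, min_size=1):
    """List of (set size, fill-ins) after each discard of the worst pivot."""
    current = set(pivots)
    profile = [(len(current), fill_count(B, current))]
    while len(current) > min_size and any(markowitz(B, p) for p in current):
        worst = max(sorted(current), key=lambda p: markowitz(B, p))
        current.remove(worst)
        profile.append((len(current), fill_count(B, current)))
    return profile
```

On a 4 by 4 pattern where pivot 0 couples rows and columns 1 and 2, discarding pivot 0 from the set {0, 3} takes the step from 2 fill-ins to none, the kind of trade of one parallel pivot for fewer fills that the experiment measures.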

Although the above analysis of reducing the size of the set of compatible pivots was done for each step, the actual elimination and fill for a step were done using the maximal compatible with the lowest Markowitz sum. This analysis is repeated at each parallel step, and the results show that it is possible to decrease the generation of fill-ins at a step significantly by reducing the amount of parallel work slightly. In fact, discarding only one compatible pivot decreases the number of fill-ins that would otherwise be generated by at least about one third.

We also performed this analysis over all generated sets of compatible pivots. In this experiment, we again chose the maximum-sized set with minimum Markowitz sum and used it for reducing the matrix, as described below:

- For all sets of maximum size, find the pivot with maximum Markowitz count and remove it from the set.

- Find the set of maximum size and minimum Markowitz sum, and determine the number of fill-ins that would be generated from the processing of this set.

- Repeat the above process.
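The first two steps of this variant can be sketched as below. The sketch is ours and hypothetical in its details: every maximum-size set loses its highest-count pivot (lowest index on ties), and since all trimmed sets then have the same size, the choice reduces to the smallest Markowitz sum. The fill-ins of the chosen set would then be counted as in the previous experiment.

```python
def markowitz(B, i):
    n = len(B)
    r_off = sum(1 for j in range(n) if j != i and B[i][j])
    c_off = sum(1 for k in range(n) if k != i and B[k][i])
    return r_off * c_off


def trim_then_select(B, candidate_sets):
    """Trim every maximum-size set, then pick the trimmed set with least Markowitz sum."""
    largest = max(len(s) for s in candidate_sets)
    trimmed = []
    for s in candidate_sets:
        if len(s) == largest:
            worst = max(sorted(s), key=lambda p: markowitz(B, p))
            trimmed.append(set(s) - {worst})
    return min(trimmed, key=lambda s: sum(markowitz(B, p) for p in s))
```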

Similar results were obtained by applying the above two procedures to our test matrices. Hence, we will use the first method for the next phase; that is, the next analysis is performed on the set of maximum size and minimum Markowitz sum.

Even though the above experiment shows that we can always generate fewer fill-ins at a step by avoiding the maximum possible parallelism, it does not show that the fill-ins are not merely deferred to later steps. In our next experiment, we again choose the maximum-sized set with minimum Markowitz sum, but this time we discard the pivot with maximum Markowitz count from the set and use the resulting set for elimination and fill generation. We also repeat the previous analysis by reducing the set size and determining the number of resulting fill-ins. This work confirms our previous result: by discarding some of the parallel pivot candidates with high Markowitz counts, we decrease the total generation of fill-ins.

Results  of  Complete  Analyses 

A set of circuits to be simulated by the SPICE circuit simulation program is available as a benchmark to test SPICE. We used these circuits as input to SPICE and generated their corresponding matrices, which serve as our test cases. The first circuit is a simple differential pair and generates a 16 by 16 matrix with 57 nonzeros. The matrices are small, ranging from 12 by 12 to 21 by 21. The complexity of our algorithm for generating all possible maximal sets of compatible pivots does not allow us to test larger matrices, but the generated information provides valuable statistics about parallelism in circuit matrices. An algorithm with tolerable complexity for producing a set of compatible pivots will involve heuristics; therefore, it will not give complete information about the matrix.

The results of the comparison of the Markowitz sum and Ored Markowitz strategies are summarized in Table 1.1 (tables are provided in Appendix A at the end of this paper). The first column describes the circuit, the order of the matrix, and the number of nonzeros. The second column indicates the parallel pivoting step. Columns 3 to 5 correspond to the Markowitz sum strategy described earlier, and columns 6 through 8 correspond to Ored Markowitz. For each strategy, the first column is the size of the maximum set of pivots obtained at a step, the second column is the minimum operation count obtained for such a set, and the last column specifies the number of fill-ins generated as the result of processing the indicated set. Column 9 indicates the total number of maximal compatibles generated at each step. The last two columns are information generated by the SPICE program about the amount of fill-in generated and the percentage of the matrix which is zero.

As can be seen from the table, in every case the second strategy resulted in equal or more fill-ins and in equal or fewer parallel steps with fewer rows reduced. This indicates that the Markowitz sum is a better heuristic for selecting the set of pivots among many sets. This can be observed for the 16 by 16 matrix of the differential pair circuit. In the first step, with sets of size six, Markowitz sum generated 6 fill-ins while Ored Markowitz generated 8. Over the whole triangulation, Ored Markowitz resulted in twice as many fill-ins as the Markowitz sum, and fewer pivots were processed in parallel (14 for Ored Markowitz and 15 for Markowitz sum).
The same behavior resulted from the ECL compatible SCHMITT trigger circuit, which produced an 18 by 18 matrix. The number of fill-ins at step 2 of parallel triangulation is 10 for Ored Markowitz and only 1 for the Markowitz sum, with none being generated in the following steps. Ored Markowitz generated 1 more fill-in at step 3 and was not able to find any more parallel pivot candidates, whereas the first strategy continued for one more parallel step. Of course, there are cases where both strategies produced similar or close results, as can be seen from the table. The table also indicates that in parallel runs the generation of fill-ins is much higher than in sequential runs of the SPICE program. At the same time, the matrices generally do not become dense rapidly, and parallel pivot candidates are available almost up to the very last steps of the triangulation process.

The result of our next analysis is shown in Table 1.2. At each step, a set of maximum size and minimum Markowitz sum is selected to reduce the matrix. Furthermore, from this set we repeatedly remove a pivot with maximum Markowitz count and compute the number of fill-ins that would be generated if this set were used to reduce the matrix. As can be seen from the table, in every case it is possible to reduce the number of fill-ins significantly by reducing the amount of parallelism slightly. For example, for the 16 by 16 matrix with 57 nonzeros, if we reduce the number of compatible pivots from 6 to 4 by removing the two pivots with the highest Markowitz counts from the set, we can prevent the generation of additional fill-ins. Also, for the last 21 by 21 matrix with 158 nonzeros, we can reduce the number of generated fill-ins by a factor of 2 (from 40 to 20) if we discard two pivots in step one in the same fashion. This general result can be observed in the table for all cases and all parallel steps.

In the next experiment we confirm that it is possible to reduce the total generation of fill-ins, as opposed to just the fill-ins at each step, by using fewer than the maximum number of compatible pivots. In every case we have been able to reduce the total number of fill-ins by a substantial fraction (at least about one third) compared to the case where maximum parallelism was employed. These results are summarized in Table 1.3. Here we chose to discard the pivot with the highest Markowitz count from the maximal compatible set. If the maximal set would not generate any fill-ins, because of a zero Markowitz sum, we did not discard any pivots from the set. The total number of fill-ins generated for the first matrix (16 by 16) is 2, which is one third of the amount generated in our first experiment (6). This number was reduced from 40 to 26 for the 21 by 21 matrix with 151 nonzeros. In this case the number of parallel steps increased from 5 to 6, but the total number of rows that could be reduced in these steps remained constant. In fact, in most cases the number of parallel steps increases, but the total number of pivots that can be processed in these steps changes little (by at most one reduced row, up or down).


Generation  of  Compatible  Sets  from  the  Incompatible  Table 

It is clear that in large sparse circuit matrices the number of possible pivots to be processed at each step will be much higher than in our small example matrices; therefore, it will be possible to obtain enough parallel work by considering only a sub-maximal set of compatible pivots at each step. The algorithm described above involves a complete binary tree search and has complexity exponential in the order, n, of the sparse matrix. In order to come up with a good heuristic, we need to relax the requirement of finding the maximal set of compatible pivots with minimum Markowitz sum. The above analysis also shows that we will have to reduce the size of the set to decrease the generation of fill-ins. Keeping these problems in mind, an acceptable set would be one which has a large number of pivot candidates for parallel processing and a low enough Markowitz sum. We therefore need a procedure which tends to produce a number of compatible sets of reasonably large size and low Markowitz sum. Having generated such sets, we can then choose the best candidate among them using the same criteria as before. In what follows, we describe the issues that lead us to a good heuristic algorithm and to a set of parameters for trading off fill-in generation against the size of the set of parallel pivot candidates.
So far, the information from the incompatible table has been used to construct the maximal compatible sets of pivots in a complete binary tree search algorithm. A more careful analysis of the incompatible table could provide a set of compatible pivots without the need for searching the tree. As we know, this table gives information about the incompatible pairs of pivots. In other words, by looking at column i of the table, corresponding to pivot i, we obtain all pivot numbers j > i such that pivot j is incompatible with pivot i, for a given ordering of the matrix. Note that we are assuming pivots are taken from the diagonal of the matrix and are numbered 1 through n, corresponding to rows 1 through n of the matrix. Consequently, if column i of the table is null, then pivot i is compatible with every pivot whose column lies to the right of column i. Hence, by scanning the incompatible table, we can find a set of pivots whose corresponding columns in the table are null. Clearly, pivots with this property are mutually compatible and can be grouped in a compatible set. Using the representation of the incompatible table described earlier, the above procedure can be formulated as:

    scan imptbl from right to left
    for each column i of imptbl do
        if (imptbl_i is empty) then
            (* add the corresponding pivot to the set of compatibles *)
            compset = compset + [i]

where compset is the set of compatible pivots whose corresponding columns in the table are null. Now, if there exists a pivot k such that the set of pivots incompatible with it in column k of the table is disjoint from the set of already constructed compatible pivots in compset, then k is compatible with every pivot in compset. (Because the columns are scanned from right to left, every pivot already in compset lies to the right of k, so column k records all of its possible conflicts with them.) Therefore, we can expand compset by adding k to it. The complete procedure can now be described as:

    scan imptbl from right to left
    for each column i of imptbl do
    begin
        if (imptbl_i ∩ compset = empty) then
            (* add [i] to the set of compatibles *)
            compset = compset + [i]
        else
            delete row i of imptbl
    end
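A minimal Python sketch of the two scans above follows. The compatibility relation is defined elsewhere in the paper, so this sketch assumes (hypothetically) that diagonal pivots i and j are incompatible whenever $a_{ij}$ or $a_{ji}$ is nonzero. It stores column i of the table as the set of j > i that are incompatible with i, with positions numbered from 0. The "delete row i" step is implicit here: a rejected pivot never enters `compset`, so it can never cause a later intersection test to fail.

```python
def incompatible_table(rows):
    """Column i of the table: {j > i : pivot j is incompatible with pivot i}.

    Hypothetical rule for this sketch: diagonal pivots i != j are
    incompatible when a_ij or a_ji is nonzero. rows[i] holds the column
    indices of the nonzeros in row i. One pass over the nonzeros: O(nz).
    """
    imptbl = [set() for _ in rows]
    for i, row in enumerate(rows):
        for j in row:
            if i != j:
                lo, hi = min(i, j), max(i, j)
                imptbl[lo].add(hi)      # record each pair in the earlier column
    return imptbl


def ordered_compatible_set(imptbl):
    """Scan the columns right to left, keeping pivot i when none of the
    pivots it conflicts with has already been accepted."""
    compset = set()
    for i in reversed(range(len(imptbl))):
        if not (imptbl[i] & compset):   # null column, or no conflict with compset
            compset.add(i)
    return compset
```

In the 5 by 5 pattern used in the test, columns 3 and 4 are null. Column 0 is not null, but its only incompatible pivot, 1, has already been rejected, so the expansion step adds pivot 0 as well.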

The compatible set, compset, produced by this procedure will be referred to from now on as an ordered compatible set, since it is obtained by imposing a specific ordering on the diagonal elements of the matrix to get the incompatible table. As an example, the incompatible table of matrix A2 in Fig. 2.1.a is given in Fig. 2.1.b. The compatible set corresponding to the null columns of the table consists of pivots 10 and 11. After the above expansion, this set consists of pivots 2, 10, and 11.

[Fig. 2.1.a: Matrix A2. Fig. 2.1.b: Incompatible table of matrix A2.]

As was explained previously, our strategy for selecting a compatible set among all possible compatible sets of equal maximum size was to select the one with minimum Markowitz sum, that is, the set in which the sum of the Markowitz numbers of its pivots is minimum. If we consider the set of compatible pivots constructed above directly from the incompatible table, we see that it consists of pivots 2, 10, and 11, which have Markowitz numbers 0, 4, and 12, respectively. In general, we would like to have a compatible set consisting of pivots with Markowitz numbers as low as possible. It is also clear that pivots with low Markowitz numbers generally have fewer incompatibilities. Moreover, by looking at the incompatible table of Fig. 2.1.b, we see that the compatible pivots 10 and 11 are obtained from the right end portion of the
table. This is usually the case, since as we construct the columns of the incompatible table, we are left with a smaller submatrix to work with. Thus, after completing each column, we have fewer incompatibles left for the construction of the next column. These observations lead us to use a different ordering, in which the first column of the incompatible table has the maximum number of incompatibles and, as we work our way to the right end of the table, the number of incompatibles decreases to the minimum. Such an ordering implies that the resulting incompatible table will have more null columns clustered at the right end, so the ordered compatible set constructed from the ordered table will tend to be of larger size and smaller Markowitz sum than the result of the above procedure. As a result of these arguments, we sort the pivots in order of decreasing Markowitz numbers. Using this new ordering, we construct a new incompatible table with the first column corresponding to the pivot with the highest Markowitz number and the last column corresponding to the pivot with the lowest Markowitz number. As an example, the Markowitz numbers and the new ordering of the pivots are shown in Fig. 2.2.a for matrix A2 of Fig. 2.1.a. The corresponding ordered incompatible table is given in Fig. 2.2.b. It can be seen from Fig. 2.2.b that the collection of pivots corresponding to null columns of the table gives a compatible set of size 4 and Markowitz sum 4, compared with the set of size 2 and Markowitz sum 16 generated from the unordered incompatible table of Fig. 2.1.b. After expansion, this set grows to a compatible set of size 5 and Markowitz sum 16.
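The reordering can be sketched in Python as below. As an assumption of this sketch (the paper defines compatibility elsewhere), diagonal pivots i and j are treated as incompatible when $a_{ij}$ or $a_{ji}$ is nonzero, and `rows[i]` holds the nonzero column indices of row i. Pivots are sorted by decreasing Markowitz number $(r_i - 1)(c_i - 1)$, ties keeping their original order because Python's sort is stable, and each incompatible pair is recorded in the column of whichever pivot comes first in the new order.

```python
def markowitz_numbers(rows):
    """(r_i - 1) * (c_i - 1) for every diagonal pivot of the pattern `rows`."""
    n = len(rows)
    col_count = [0] * n
    for row in rows:
        for j in row:
            col_count[j] += 1
    return [(len(rows[i]) - 1) * (col_count[i] - 1) for i in range(n)]


def ordered_incompatible_table(rows):
    """Return (order, imptbl) with pivots sorted by decreasing Markowitz number.

    order[k] is the pivot in column k; imptbl[k] holds the later columns whose
    pivots are incompatible with order[k].
    """
    mk = markowitz_numbers(rows)
    order = sorted(range(len(rows)), key=lambda p: mk[p], reverse=True)
    pos = {p: k for k, p in enumerate(order)}
    imptbl = [set() for _ in order]
    for i, row in enumerate(rows):
        for j in row:
            if i != j:                       # a_ij != 0 off the diagonal
                a, b = sorted((pos[i], pos[j]))
                imptbl[a].add(b)             # record the pair in the earlier column
    return order, imptbl


def ordered_compatible_pivots(rows):
    """Right-to-left scan of the ordered table; returns original pivot labels."""
    order, imptbl = ordered_incompatible_table(rows)
    compset = set()
    for k in reversed(range(len(order))):
        if not (imptbl[k] & compset):
            compset.add(k)
    return {order[k] for k in compset}
```

In the small arrowhead pattern of the test, pivot 3 is coupled to every other pivot. A right-to-left scan in the original order accepts pivot 3 first and then rejects everything else, whereas the ordered table moves pivot 3 to the first column and the scan returns pivots 0, 1, and 2.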


Limited  Binary  Search  Tree 

In this section, we combine the idea of an ordered compatible set with the tree search algorithm described earlier to obtain a limited tree search algorithm which produces an acceptable set of compatible pivots for reducing the matrix. Given the set of all pivot elements, we can now directly produce a set of compatible pivots from the ordered incompatible table. This ordered compatible set is obtained for the initial starting set at the root of the binary search tree. A child set in the tree is a subset of its parent set, so every set below the root has fewer pivots than the root set. Such a set could itself be considered a starting set. Provided we could produce the correct incompatible table for this set, we could generate its corresponding ordered compatible set directly from the new table.

The incompatible table for a given starting set, $S_i$, is the original table with the rows and columns corresponding to the pivots absent from $S_i$ eliminated. If we let $S$ be the initial set of all pivot candidates in the sparse matrix and $S_i$ be an arbitrary starting set in the tree, then the procedure to obtain the ordered compatible set for $S_i$, compset_i, from an updated and ordered incompatible table can be represented as:
    1.  compset_i = empty
    2.  less = S - S_i
    3.  for j = n downto 1 do
    4.  begin
    5.      if (j ∈ S_i) then
    6.      begin
    7.          tempset = imptbl_j - less
    8.          tempset = tempset ∩ compset_i
    9.          if (tempset = empty) then
    10.             compset_i = compset_i + [j]
    11.     end
    12. end

where less is the set of pivots absent from $S_i$. Line 5 allows only those columns of the incompatible table whose corresponding pivot j is in $S_i$ to be tested for the compatibility relation. The set less is used in line 7 to eliminate the rows corresponding to the pivots absent from $S_i$. The set compset_i holds the current set of compatible pivots, and line 9 checks whether a new pivot is compatible with those already in compset_i.
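The numbered procedure above maps almost line for line onto Python, and the comments in the sketch give the corresponding pseudocode line numbers. The table layout is an assumption of the sketch: `imptbl[j]` is a set holding the later column positions that are incompatible with position j, and positions are numbered from 0 rather than 1 through n.

```python
def ordered_compatible_for_start_set(imptbl, start_set):
    """Ordered compatible set compset_i for a starting set S_i.

    imptbl[j]: later column positions incompatible with position j (assumed
    layout); start_set: the positions belonging to S_i. Positions are 0-based.
    """
    n = len(imptbl)
    compset_i = set()                          # line 1
    less = set(range(n)) - set(start_set)      # line 2: pivots absent from S_i
    for j in reversed(range(n)):               # line 3: j = n downto 1
        if j in start_set:                     # line 5: only columns of S_i
            tempset = imptbl[j] - less         # line 7: drop rows of absent pivots
            tempset &= compset_i               # line 8: conflicts with compset_i
            if not tempset:                    # line 9: compatible with all so far
                compset_i.add(j)               # line 10
    return compset_i
```

Since compset_i only ever contains members of $S_i$, the subtraction in line 7 does not change the outcome of the test in line 9; it simply mirrors the table update described in the text.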

It is now possible to produce an ordered compatible set for any set at an arbitrary point in the tree directly from the incompatible table. Given a starting set, our method of producing an ordered compatible set tends to generate a large set of low Markowitz sum. Thus, we can produce ordered compatible sets for many starting sets at different points in the tree and choose the best candidate among them to reduce the matrix. The following theorem eliminates some of the redundant work.



Theorem 

All ordered compatible sets derived from the starting sets in the binary search tree at level $L-1$ or less are included in the ordered compatible sets generated from the sets at level $L$ of the tree. (That is, it is only necessary to generate ordered compatible sets for the starting sets at level $L$ to cover those at levels $l < L$.)


Proof

Let $S$ be the initial starting set at the root of the binary tree, consisting of all pivots. Let $S_l$ and $S_r$ be the left and right children of $S$. Let compset be the ordered compatible set obtained directly from the incompatible table for the set $S$. Similarly, let compset_l and compset_r be the ordered compatible sets corresponding to $S_l$ and $S_r$, respectively.

A pivot $P_x$ can split a set $S$ if:

$$P_x \in S \quad \text{and} \quad \{\text{set of incompatibles with } P_x\} \cap S \neq \emptyset.$$

Assume $P_x$ splits $S$ into $S_l$ and $S_r$; then:

$$S_l = S - \{\text{set of incompatibles with } P_x\} \quad \text{and} \quad S_r = S - \{P_x\}.$$

There  are  two  cases  to  consider: 

i. $P_x \notin$ compset

Because $P_x$ was rejected when $S$ was scanned, removing it changes none of the intersection tests. The table corresponding to $S_r$ therefore yields the same null columns and compatible pivots as compset, so:

$$\text{compset} = \text{compset}_r$$

ii. $P_x \in$ compset

Then we must have:

$$\text{imptbl}_x \cap \text{compset} = \emptyset$$

since $P_x$ is compatible with all pivots in compset. In this case, compset_l obtained from $S_l$ is equal to compset. We know $P_x$ is in the set $S_l$, and that the incompatible table for $S_l$ is the same as the table for the parent set with the rows and columns corresponding to the incompatibles of $P_x$ eliminated. None of these incompatibles belongs to compset, so all the compatibility information which resulted in the production of compset is transferred from the parent set $S$ to $S_l$, and consequently:

$$\text{compset} = \text{compset}_l.$$

The above argument proves that, at level 1, one of the sets $S_l$ or $S_r$ will produce the same ordered compatible set as its parent set. The same argument holds for the two children of any set. In other words, at any point in the tree, the ordered compatible set corresponding to a parent set is reproduced by one of its children.

Induction on the level verifies that generating the ordered compatible sets for every set from the root through level $L$ of the tree does not produce any more information than producing the ordered compatible sets for every set at level $L$ only.
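The theorem can also be checked empirically. The sketch below builds random incompatible tables, computes the ordered compatible set of a random starting set, splits that set on every pivot able to split it (the left child keeps $P_x$ and drops its incompatibles, the right child drops $P_x$), and verifies that one of the two children reproduces the parent's ordered compatible set. A randomized check is no substitute for the proof; the table layout (`imptbl[j]` holds the later positions incompatible with j) and the random model are assumptions of the sketch.

```python
import random


def ordered_compatible(imptbl, start):
    """Ordered compatible set of `start`; imptbl[j] = later positions incompatible with j."""
    comp = set()
    for j in reversed(range(len(imptbl))):
        # Subtracting absent pivots is unnecessary: comp only holds members of start.
        if j in start and not (imptbl[j] & comp):
            comp.add(j)
    return comp


def incompatibles(imptbl, x):
    """All positions incompatible with x: row x plus column x of the table."""
    return imptbl[x] | {j for j in range(x) if x in imptbl[j]}


def theorem_holds(n=10, trials=300, density=0.3, seed=1):
    """Randomized check that one child always reproduces the parent's set."""
    rng = random.Random(seed)
    for _ in range(trials):
        imptbl = [{j for j in range(i + 1, n) if rng.random() < density}
                  for i in range(n)]
        start = {j for j in range(n) if rng.random() < 0.8}
        parent = ordered_compatible(imptbl, start)
        for x in start:
            inc = incompatibles(imptbl, x)
            if not (inc & start):
                continue                      # P_x cannot split this set
            left = start - inc                # keeps P_x, drops its incompatibles
            right = start - {x}               # drops P_x
            if parent not in (ordered_compatible(imptbl, left),
                              ordered_compatible(imptbl, right)):
                return False
    return True
```

Case ii of the proof corresponds to the left child and case i to the right child.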

As a consequence of the theorem, we generate all the sets at a given level in the binary tree, and for each set we produce an ordered compatible set from the ordered incompatible table. Among the generated compatible sets, we choose the set of largest size and lowest Markowitz sum to reduce the matrix, and call the resulting set the elimination set.

Since each set at each level of the tree is split on a given pivot according to that pivot's incompatibility information, the starting sets at different levels could be generated in various ways:

i. We could split the starting sets using the original pivot ordering given by the input sparse matrix. This would generate completely random results.
ii. The same ordering used to order the incompatible table could be used to split the sets. This left-to-right ordering does not seem to agree with our low Markowitz sum requirement. At each split (level in the tree), we include one of the pivots, say $p_j$, with the highest number of incompatibles (highest Markowitz number) in the left subtree. This inclusion also means we take a large number of pivots incompatible with $p_j$ out of the sets in the left subtree. These pivots have lower Markowitz numbers than $p_j$ and could themselves be compatible with some other elements in the set. As a result, this ordering will produce a left set considerably smaller in size than the resulting right set. Moreover, the left set contains pivots of high Markowitz number, which would produce many fills if used to reduce the matrix. Therefore, some of the large compatible sets with small Markowitz sums cannot be generated from any of the sets in the left subtree unless we search very deep in the tree. In this case, the desired compatible sets would be in one of the right subtrees.

iii. A third alternative would be to split the sets with pivots in increasing order of their Markowitz numbers. In this case, the incompatibility information of the pivots used to split a starting set is taken from the right end of the incompatible table. Because column i of the table lists only the incompatible pivots to the right of pivot i, while those to its left appear in row i, the complete incompatibility information for a pivot i is obtained by concatenating row i and column i of the table. This process seems to give a better balance to the binary tree for the first few levels used to generate the starting sets required in our algorithm. Furthermore, it has the property that it does not ignore pivots of low Markowitz numbers.
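Strategy iii can be sketched in Python as follows. As described in alternative iii, the full incompatibility set of a pivot is assembled from its row and its column of the triangular table (`imptbl[j]` is assumed to hold the later positions incompatible with j, numbered from 0). The splitting rule is a hypothetical simplification: at each level, every set is split on the lowest-Markowitz pivot (rightmost column) that can split it, and a set that no pivot can split is already compatible and is carried down unchanged.

```python
def incompatibles(imptbl, x):
    """Row x plus column x of the triangular table: every pivot incompatible with x."""
    row_x = {j for j in range(x) if x in imptbl[j]}
    return row_x | imptbl[x]


def starting_sets(imptbl, level):
    """Starting sets at depth `level`, splitting on pivots in increasing
    Markowitz order (columns taken from the right end of the ordered table)."""
    n = len(imptbl)
    frontier = [set(range(n))]
    for _ in range(level):
        next_frontier = []
        for s in frontier:
            # Lowest-Markowitz pivot (rightmost column) that can split s.
            x = next((p for p in reversed(range(n))
                      if p in s and incompatibles(imptbl, p) & s), None)
            if x is None:
                next_frontier.append(s)         # s is already compatible
                continue
            next_frontier.append(s - incompatibles(imptbl, x))  # left: keep P_x
            next_frontier.append(s - {x})                       # right: drop P_x
        frontier = next_frontier
    return frontier
```

For the five-pivot table in the test, the first split uses the rightmost pivot, 4, whose only conflict (pivot 1) is found in row 4 rather than in column 4.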

The high level description of this algorithm is given below:

Program Parallel Pivoting

    calculate Markowitz numbers of pivots in the remaining unreduced matrix
    SORT pivots in decreasing order of Markowitz numbers
    produce all starting sets at level ULEVEL, taking the pivots used to split
        the sets from the root to ULEVEL in order of increasing Markowitz numbers
    for each set at ULEVEL produce an ordered compatible set from
        the updated ordered incompatible table
    among the ordered compatible sets generated above, choose the
        maximum sized set with minimum Markowitz sum (elimination set)
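Putting the steps of Program Parallel Pivoting together gives the self-contained Python sketch below. It is not the authors' implementation: the compatibility rule (pivots i and j incompatible when $a_{ij}$ or $a_{ji}$ is nonzero), the rule of splitting each set on its lowest-Markowitz splittable pivot, and the tie-breaking are assumptions, and ULEVEL appears as the parameter `ulevel`.

```python
def markowitz_numbers(rows):
    """(r_i - 1) * (c_i - 1) for each diagonal pivot; rows[i] = nonzero columns."""
    n = len(rows)
    col_count = [0] * n
    for row in rows:
        for j in row:
            col_count[j] += 1
    return [(len(rows[i]) - 1) * (col_count[i] - 1) for i in range(n)]


def parallel_pivoting_step(rows, ulevel=4):
    """One parallel step; returns (elimination set as pivot labels, Markowitz sum)."""
    n = len(rows)
    mk = markowitz_numbers(rows)

    # SORT pivots in decreasing order of Markowitz numbers; column k holds order[k].
    order = sorted(range(n), key=lambda p: mk[p], reverse=True)
    pos = {p: k for k, p in enumerate(order)}

    # Ordered incompatible table (assumed rule: a_ij != 0 or a_ji != 0).
    imptbl = [set() for _ in range(n)]
    for i, row in enumerate(rows):
        for j in row:
            if i != j:
                a, b = sorted((pos[i], pos[j]))
                imptbl[a].add(b)
    # Full incompatibility of column x: its column plus its row.
    incompat = [imptbl[x] | {j for j in range(x) if x in imptbl[j]}
                for x in range(n)]

    # Starting sets at ULEVEL, splitting on pivots in increasing Markowitz
    # order, i.e. from the right end of the ordered table.
    frontier = [set(range(n))]
    for _ in range(ulevel):
        nxt = []
        for s in frontier:
            x = next((p for p in reversed(range(n))
                      if p in s and incompat[p] & s), None)
            if x is None:
                nxt.append(s)                 # no pivot can split s
            else:
                nxt += [s - incompat[x], s - {x}]
        frontier = nxt

    # Ordered compatible set for each starting set; keep the largest one,
    # breaking ties by the smallest Markowitz sum.
    best_key, best_set = None, None
    for s in frontier:
        comp = set()
        for j in reversed(range(n)):
            if j in s and not (imptbl[j] & comp):
                comp.add(j)
        labels = {order[k] for k in comp}
        key = (-len(labels), sum(mk[p] for p in labels))
        if best_key is None or key < best_key:
            best_key, best_set = key, labels
    return best_set, best_key[1]
```

On the 4 by 4 arrowhead pattern in the test, where pivot 3 is coupled to every other pivot, the step returns pivots 0, 1, and 2 with Markowitz sum 3.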

Here, ULEVEL is a preset level number indicating the depth of the tree to be searched, and the algorithm is no longer exponential in time. Efficient implementation of the required sort and set operations is an important factor in efficient execution of the algorithm. The set operations used in the construction of the incompatible table are of order 1 (adding an element to a set or testing for membership), so the incompatible table can be constructed in time proportional to nz, the number of nonzero elements of the matrix. Generation of an ordered compatible set from the incompatible table requires scanning the n sets corresponding to the columns of the table and performing intersection and difference operations on them. These operations are of order n, with a constant factor equal to the inverse of the number of bits per computer word. The set operations are usually implemented in machine language or microcode and thus have a small time factor; compared to the time taken to execute a high-level language statement, they can be considered constant-time rather than of order n. Production of all starting sets for level ULEVEL takes constant time, and generation of an ordered compatible set for each starting set at ULEVEL takes a constant times n, as explained above. For reasonable values of ULEVEL, all ordered compatible sets can be derived in parallel for the different starting sets. In the next section we will see that good results are obtained for small, constant values of ULEVEL compared to n. The complexity of the algorithm is therefore bounded by that of the sorting step, and employing an efficient parallel sort would improve the performance of the new algorithm.
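The word-parallel set operations mentioned above can be imitated with Python integers used as bit vectors, where `&` intersects two sets and `|` adds members. CPython stores large integers as arrays of fixed-width digits (30 bits each on typical builds), so an intersection of two n-bit masks costs roughly n divided by the digit width, in line with the constant factor discussed above. The encoding is only an illustration; the paper's own set representation is not reproduced here.

```python
def to_masks(imptbl):
    """Encode each column of the incompatible table as an integer bit vector."""
    masks = []
    for column in imptbl:
        mask = 0
        for j in column:
            mask |= 1 << j              # set bit j: pivot j is incompatible
        masks.append(mask)
    return masks


def ordered_compatible_bits(masks, start_mask):
    """Ordered compatible set of the starting set encoded by start_mask.

    Each test is one AND of two bit vectors, which CPython performs
    digit by digit rather than element by element.
    """
    comp = 0
    for j in reversed(range(len(masks))):
        bit = 1 << j
        if start_mask & bit and not (masks[j] & comp):
            comp |= bit
    return comp
```

With the five-pivot table in the test, the full starting set `0b11111` yields `0b11001`, that is, pivots 0, 3, and 4.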

Balance  between  Parallelism  and  Fill-in  Generation 

Even though the above procedure tends to produce large sets of low Markowitz sums, we could still reduce the generation of fill-ins by considering a subset of the elimination set. That is, there could still be some room for trading off parallelism against fill generation. To accomplish this, we need parameters to control the number of pivots to be processed in parallel and the number of fills to be generated. One such parameter could be the size of the set of compatible pivots. By allowing a percentage of the set to be discarded, we can control the number of compatible pivots without limiting our parallel work too much. For clarity, this parameter is called the shrinkage parameter; it is used as a lower limit when shrinking the elimination set by a percentage of its size. A different parameter could be an upper limit on the size of the elimination set, which would allow just enough work to keep our parallel processes busy. Of course, the elimination set must not be shrunk by throwing pivots out of it arbitrarily. In general, we would like to shrink the set by discarding pivots that would cause the generation of many fills. Such pivots tend to have high Markowitz numbers. Since the pivots are already ordered according to their Markowitz numbers, we can use this ordering to scan the pivots with the highest Markowitz numbers in the elimination set and test them against a threshold value. If pivots with Markowitz numbers greater than the threshold exist and our shrinkage parameter allows, they are discarded from the elimination set. The threshold ensures that we do not shrink a set that consists entirely of good pivot candidates with reasonably low Markowitz numbers. To serve this purpose, the threshold value should be set relative to the low and high Markowitz numbers of the pivoting elements in the matrix. Again, the ordered Markowitz numbers of the pivots can be used to set such a threshold conveniently. One way is to specify a fraction of the candidates to be regarded as candidates for discarding from the elimination set. Consequently, we set the threshold to the Markowitz number of the pivot at a specific position in the list of pivot elements of the unreduced matrix (ordered by decreasing Markowitz numbers). Any pivot above this point in the ordered list is considered to have a high Markowitz number and is therefore a candidate for being discarded from the set, and any pivot below this point is considered acceptable.
Pivots in the elimination set are scanned in order of decreasing Markowitz number. If a pivot with a Markowitz number greater than the threshold exists and the set is not already of minimum size, it is discarded from the set. The process is repeated until either no more pivots of large Markowitz number are left in the set or the set cannot be shrunk further. In the next section we present the results of the different strategies and the various parameters discussed here for a number of test matrices.
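The threshold and shrinkage rules can be sketched as a single Python function. How the two parameters are converted into numbers is not spelled out in full, so both conversions here are assumptions: the threshold is the Markowitz number at index ⌊fraction × n⌋ of the decreasingly ordered list of all n remaining pivots, and the shrinkage parameter is the largest fraction of the elimination set that may be discarded. The function name is hypothetical.

```python
import math


def shrink_elimination_set(elim, mk, threshold_fraction, shrinkage):
    """Discard high-Markowitz pivots from an elimination set.

    elim: pivot labels of the elimination set.
    mk: Markowitz number of every pivot in the unreduced matrix (list).
    threshold_fraction: position, as a fraction of the decreasingly ordered
        list of all pivots, whose Markowitz number becomes the threshold.
    shrinkage: largest fraction of the elimination set that may be discarded.
    (Both parameter conversions are assumptions of this sketch.)
    """
    ranked = sorted(mk, reverse=True)
    index = min(len(ranked) - 1, int(threshold_fraction * len(ranked)))
    threshold = ranked[index]
    min_size = len(elim) - math.floor(shrinkage * len(elim))

    # Scan pivots from the highest Markowitz number downwards.
    kept = sorted(elim, key=lambda p: mk[p], reverse=True)
    while len(kept) > min_size and mk[kept[0]] > threshold:
        kept.pop(0)
    return set(kept)
```

The loop stops for either of the two reasons given in the text: the next pivot is no longer above the threshold, or the set has reached its minimum size.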

Analysis  of  the  Results 

The complexity of the binary tree search algorithm for obtaining maximal compatible sets was such that it could not be run to completion for a 38 by 38 matrix. To verify the validity of our heuristic program, we performed every analysis described in this section on the small test matrices of Table 1.1. Recall that the new algorithm produces a number of starting sets for a given level (ULEVEL) of the binary search tree. For each starting set, an ordered compatible set is produced. Among the generated ordered compatible sets, the set with maximum size and minimum Markowitz sum is selected as the elimination set at that parallel step. Two alternative orderings for the generation of starting sets at ULEVEL were discussed earlier. For simplicity, we call the algorithm that reduces a sparse matrix by compatible pivots using the decreasing order of Markowitz numbers for starting-set splitting DCOMP. Similarly, the algorithm which uses the increasing order of Markowitz numbers is called ICOMP.

Detailed information produced by DCOMP and ICOMP is presented for three sparse matrices in Table 2.1. Column 1 of the table gives a description of the sparse matrix under consideration. Column 2 specifies the parallel step. Columns 3, 4, and 5 give the number of compatible pivots in the elimination set, its Markowitz sum, and the number of fill-ins generated at each step for program DCOMP. Similar information is summarized in the next three columns for program ICOMP. The information presented here is for ULEVEL = 1. The first two matrices have been completely analyzed in the previous section and are presented here to show the validity of our proposed algorithms. It is interesting to see that, for the first matrix, DCOMP produced exactly the same results as the complete tree search program. ICOMP, on the other hand, produced different results. Even though ICOMP produces a smaller compatible set in the first step, it finds larger sets in the following steps and reduces the same number of rows (21) in five parallel steps. ICOMP generates 22 fills, roughly half the number produced by DCOMP (40) or even by the complete binary tree search algorithm (40). The same behavior is observed for the second 24 by 24 matrix. The third matrix is obtained from the circuit of an 8-bit full adder and is a 144 by 144 matrix with 616 nonzeros. Note that both algorithms produced an elimination set of 72 pivots in the first step, so half of the matrix can be reduced in parallel in one step. In this case the advantage of ICOMP over DCOMP is not significant.

To see how the depth of the search affects the resulting compatible sets, we ran both programs for values of ULEVEL between 2 and 5 for a number of matrices. These results are summarized in Table 2.2. Again, the first column describes the matrix, and the second column specifies ULEVEL. Columns 3 to 6 give the related information for the DCOMP program. Here the first and the third of these columns specify the number of parallel steps taken to reduce the matrix and the number of rows reduced in those steps, respectively. The second is the average parallel work at each step, obtained by dividing the total number of rows reduced in parallel by the number of parallel steps. The fourth gives the total number of fill-ins generated by parallel reduction. The next four columns of the table provide similar information for the ICOMP program. The last two matrices of the table are produced by the SPAR program, which is a structural analysis program [12]. These two matrices have a peculiar block structure. Our initial objective was to study sparse matrices arising from SPICE. These matrices ordinarily have a random sparsity structure, but at the same time, the limited connectivity between nodes of the input circuit results in a limited number of nonzeros per row and column. The SPAR matrices provide some insight into the behavior of our heuristic algorithms for a wider class of matrices.

It is clear from the table that, in almost every case, ICOMP produces better results, both in terms of the number of rows reduced in parallel and the number of fill-ins generated. As expected, DCOMP finds elimination sets of lower Markowitz sums as we search deeper in the tree. This is observed for the first 24 by 24 matrix and for the last SPAR-generated matrix. In the first matrix, ICOMP reduced 21 rows in 5 parallel steps, while DCOMP generated more than twice as many fills and reduced only 20 rows. The number of fills decreases for DCOMP as ULEVEL is increased, while ICOMP shows the opposite trend. This also shows that reasonably acceptable compatible sets, both in terms of size and Markowitz sum, are generated for small values of ULEVEL, and that it is not necessary to search very deep in the tree. The above observations hold for every matrix presented in the table except the 78 by 78 matrix produced by the SPAR program. This matrix does not have the characteristics typical of SPICE-generated matrices, but, as we will see in our next analysis, acceptable results are produced for this matrix as well. Note that there are cases in which the table indicates a higher average parallelism for DCOMP than for ICOMP. In those situations, it is often the case that fewer rows have been reduced by DCOMP than by ICOMP.

The remaining analyses are performed on the ICOMP program only, since it produces better results. In what follows, a value of 4 is used for ULEVEL. The next step is to study the effects of varying the parameters proposed earlier to obtain a balance between the generation of fill-ins and the amount of parallel work. Results are summarized in Figures 2.3 to 2.6 for four of the matrices of Table 2.2. In these graphs, four different symbols are used to represent four different values of the threshold parameter. Recall that the threshold is set to the Markowitz number of a specific pivot in the ordered list of pivot candidates. On the graphs, the threshold value is given as a fraction of the pivoting elements in the remaining unreduced matrix, ordered by decreasing Markowitz numbers. For example, when the threshold is 1/3, it is set to the Markowitz number of the pivot located one third of the way down the ordered list of pivot candidates in the unreduced matrix. Any pivot in the elimination set with a Markowitz number greater than this value is a candidate to be discarded from the set.

The graphs present the number of generated fill-ins
versus the shrinkage parameter. In each case, the analysis is performed for
threshold values of 1/10, 1/3, 1/2, and 2/3. For every value of the threshold,
compatible sets consist of pivoting elements with Markowitz numbers lower
than those of the highest one tenth. These graphs show more variation with the
threshold parameter, even though a higher number of fill-ins is produced.
Without exception, the fewest fill-ins are produced for the largest values of
the threshold and the shrinkage parameter.

The number of fill-ins produced by ICOMP compares reasonably with
sequential runs on the same matrices. For example, the sequential run on the
8-bit full adder matrix produced 160 fills, and ICOMP produced 106, which is
an increase of about 18%. In general, as expected, the number of fill-ins
produced by ICOMP is higher than the sequential results, but the difference is
not great.

Conclusion 

Solution of sparse systems of equations is essential in many application
programs, and such a system often has to be solved repeatedly. In this paper we
verified that in sparse matrices arising from electronic circuits it is possible to
do computations on many diagonal elements simultaneously. A complete
analysis of some test matrices, done by generating all maximal compatible
sets of pivot elements, indicated the existence of many compatible pivots in
these matrices. We have shown that our test matrices do not become full during
the decomposition. Furthermore, it was shown that many parallel computation
steps are possible, and during these steps the matrix is often reduced
completely. The competing issues of parallel pivoting and fill-in generation
have been studied, and we verified through examples that it is possible to
reduce the production of fill-ins by removing some of the parallel pivot
candidates from the elimination set on the basis of high Markowitz numbers. A
heuristic algorithm was then proposed to produce large compatible sets with low
Markowitz sums by a combination of an ordered partial tree search strategy
and generation of ordered compatible sets. Different orderings to produce the
ordered compatible sets were suggested, and their advantages and
disadvantages were discussed and verified through the simulated results. A
number of parameters to provide a balance between generation of fill-ins and
the amount of parallel work were suggested, and their effects were determined
in the simulated results.
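The idea of an ordered compatible set can be illustrated with a greedy Python sketch. It is a simplified stand-in rather than the paper's exact procedure, and it omits the partial tree search. As hypothetical conventions, the incompatible table is a dict mapping each pivot to the set of pivots it cannot be processed with, and pivots are scanned in increasing Markowitz order; each accepted pivot removes its incompatible pivots from the candidate pool by set difference, which tends to keep the Markowitz sum of the chosen set low.

```python
def ordered_compatible_set(incompatible, markowitz):
    """Greedy ordered compatible set (hypothetical sketch, not ICOMP itself).

    incompatible: dict pivot -> set of pivots incompatible with it.
    markowitz: dict pivot -> Markowitz number.
    Pivots are scanned in increasing Markowitz order (ties broken by index).
    """
    order = sorted(markowitz, key=lambda p: (markowitz[p], p))
    available = set(order)
    chosen = []
    for p in order:
        if p in available:
            chosen.append(p)
            # Set difference: no pivot incompatible with p may join the set.
            available -= incompatible.get(p, set())
            available.discard(p)
    return chosen
```

For example, with incompatible pairs {0, 1}, {0, 2}, {0, 3}, {1, 2} and Markowitz numbers 9, 4, 4, 1, 0 for pivots 0 to 4, the sketch returns [4, 3, 1], a compatible set with Markowitz sum 5.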

The incompatible table required by the algorithm can be constructed in
O(nz) time, where nz is the number of nonzero elements of the matrix.
Production of starting sets for a given ri.I>\'nL takes constant time. For
ULfi^VKL small and constant compared to n, generation of ordered compatibles
from starting sets requires O(n) set intersection and difference operations.
Assuming an efficient implementation of the set operations is available, the
complexity of the heuristic algorithm is bounded above by that of the sorting
algorithm required in the program. Thus, employing an efficient parallel sort
would improve the total performance of the new algorithm. Nevertheless, our
results show that many compatible pivots are produced for parallel reduction
of the sparse matrices, and the process can be repeated until the matrix is
almost completely reduced. In cases where the matrix is not completely
reduced, the remaining submatrix is so small that parallel operations have
little effect. A significant reduction in generation of fill-ins is obtained by
varying the proposed parameters. Moreover, as a result of those parameters, a
better balance between the numbers of compatible pivots generated at
different steps was achieved, while the reduction in parallel work proved to
be insignificant.
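The O(nz) construction of the incompatible table can be sketched in Python as a single pass over the nonzero positions. The sketch assumes that pivots i and j are incompatible whenever a_ij or a_ji is nonzero; that definition, the coordinate-set input format, and the name `incompatible_table` are illustrative assumptions. Because each set insertion takes expected constant time, the pass runs in expected O(nz) time.

```python
from collections import defaultdict


def incompatible_table(nonzeros):
    """Build the incompatible table in one pass over the nonzeros.

    Assumption: diagonal pivots i and j are incompatible when a_ij != 0 or
    a_ji != 0, so every off-diagonal nonzero (i, j) links pivots i and j.
    Pivots with no incompatible partner do not appear as keys.
    """
    table = defaultdict(set)
    for i, j in nonzeros:
        if i != j:
            table[i].add(j)
            table[j].add(i)
    return dict(table)
```

Storing each pivot's incompatibles as a set makes the later set intersection and difference operations direct.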